The expansion of the genetic code with non-canonical amino acids (ncAA) enables the chemical and biophysical properties of proteins to be tailored inside cells with a previously unattainable level of precision. A wide range of ncAA with functions not found in canonical amino acids have been genetically encoded in recent years and have delivered insights into biological processes that would be difficult to access with traditional approaches of molecular biology. A major field for the development and application of novel ncAA functions has been transcription and its regulation. This is particularly attractive, since advanced DNA sequencing and proteomics techniques continue to deliver vast amounts of information on these processes at a global level, whereas complementary methodologies to study them at a detailed, molecular level and in living cells have been comparatively scarce. In a growing number of studies, genetic code expansion has now been applied to precisely control the chemical properties of transcription factors, RNA polymerases and histones, and this has enabled new insights into their interactions, conformational changes, cellular localizations and the functional roles of posttranslational modifications.

Keywords: genetic code expansion, non-canonical amino acids, transcription, epigenetics, protein-nucleic acid interactions



INTRODUCTION 

Genetic code expansion has become an important tool to study biological processes both in vitro and in living cells. This approach relies on heterologous pairs of aminoacyl-tRNA synthetases (aaRS) and tRNAs that enable the co-translational incorporation of non-canonical amino acids (ncAA) in a host organism in response to unique nonsense codons, such as the amber codon UAG (Liu and Schultz, 2010). The selectivity of the incorporation process is tightly controlled at several steps in order to maintain the integrity of the information transfer from the gene to the encoded protein. First, the heterologous aaRS and tRNA form a functional pair, i.e., the aaRS can aminoacylate the tRNA, and the aminoacylated tRNA is compatible with the downstream translation components of the host, such as elongation factors and the ribosome. Second, the pair is orthogonal, i.e., the tRNA is not a substrate of the host aaRS and the host tRNAs are not substrates of the heterologous aaRS. Moreover, the ncAA itself must not be a substrate of the host aaRS and has to be cell-permeable, non-toxic and metabolically stable.
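To make the amber-suppression logic concrete, the following Python sketch (an illustration under simplifying assumptions, not a tool used in the cited studies) shows the gene-side step of such an experiment: the codon of the residue to be replaced is mutated to the amber codon TAG, so that the orthogonal tRNA, charged with the ncAA by its aaRS, can decode it. The names `codons`, `internal_stops` and `introduce_amber` are hypothetical. The sketch assumes a DNA coding sequence read in frame 0 that starts with ATG and ends with a single stop codon, and it ignores host-level issues such as competition with release factors or amber codons in other genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(cds: str) -> list[str]:
    """Split a DNA coding sequence into codons, reading frame 0."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def internal_stops(cds: str) -> list[int]:
    """1-based residue positions of in-frame stop codons before the last codon."""
    cs = codons(cds)
    return [i + 1 for i, c in enumerate(cs[:-1]) if c in STOP_CODONS]


def introduce_amber(cds: str, residue: int) -> str:
    """Return a copy of `cds` in which the codon of `residue` (Met1 = 1) is TAG.

    Refuses to touch the start codon or the terminating stop codon, and refuses
    sequences that already carry an internal in-frame stop codon.
    """
    cs = codons(cds)
    if not 1 < residue < len(cs):
        raise ValueError("residue must lie between the start and the stop codon")
    if internal_stops(cds):
        raise ValueError("sequence already contains an internal in-frame stop codon")
    # If the terminating codon is itself TAG, the suppressor tRNA can also read
    # through it; a TAA terminator is therefore often preferred.
    cs[residue - 1] = "TAG"
    return "".join(cs)


mutant = introduce_amber("ATGAAATTTTAA", 2)  # Lys2 -> amber codon
print(mutant)                  # ATGTAGTTTTAA
print(internal_stops(mutant))  # [2]
```

In this toy sequence the lysine codon AAA at position 2 becomes TAG, and the only internal in-frame stop of the mutant is the newly introduced amber codon that the orthogonal pair is meant to suppress.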

The initial discoveries of several orthogonal tRNA/aaRS pairs and their extensive re-engineering for the selective processing of novel ncAA have provided a comprehensive toolbox with diverse functions for biological studies. Furthermore, methodological advances, such as tRNA/aaRS expression strategies for an increasing range of organisms (Young et al., 2009; Wang et al., 2010a; Greiss and Chin, 2011; Bianco et al., 2012; Parrish et al., 2012; Kang et al., 2013; Li et al., 2013), ribosome engineering (Wang et al., 2007b; Neumann et al., 2010) and strains with improved nonsense codon suppression efficiencies (Ryden and Isaksson, 1984; Mukai et al., 2010; Johnson et al., 2011; Lajoie et al., 2013; Wu et al., 2013), now enable the efficient, co-translational incorporation of ncAA in a wide range of organisms, including multicellular ones, and even of multiple different ncAA, each in response to its own codon. For a detailed introduction to this topic, we refer to an excellent recent review (Liu and Schultz, 2010).

GENETICALLY ENCODED CHEMISTRIES FOR STUDYING 
REGULATORY PROCESSES OF TRANSCRIPTION 

The introduction of ncAA with novel chemical or biophysical functions by this strategy is not only a particularly simple way to chemically modify proteins for in vitro studies, but also allows proteins to be studied directly in living cells, with minimal perturbation of their structure and their natural environment. A research field that has especially benefited from these advantages is transcription and its regulation. Here, genetic code expansion has considerable potential to redress the current imbalance between the wide availability and massive information output of discovery-oriented screening techniques on the one side and the limited availability of methods for studying transcriptional mechanisms at the detailed molecular level on the other. In particular, the diverse assay formats utilizing high-throughput sequencing have provided comprehensive knowledge of the general transcriptional activity of the genome (Djebali et al., 2012) and its association state with transcription factors, chromatin remodeling complexes or RNA polymerase components (Bernstein et al., 2012; Neph et al., 2012). Moreover, chromatin accessibility (Thurman et al., 2012) and the genomic distributions of distinct posttranslational modifications (PTM) in histones have been precisely mapped (Bernstein et al., 2012). Finally, the recent discoveries of new epigenetic DNA modifications and their associated proteins (Munzel et al., 2011; Song et al., 2012), as well as of new histone PTMs with unknown roles (Du et al., 2011; Tan et al., 2011; Zhang et al., 2011b; Olsen, 2012), have added a further layer of complexity to the current picture of transcription regulation. This wealth of findings raises a multitude of questions, many of which can only be answered by a methodology that allows the chemical properties of the involved proteins to be controlled more precisely through the site-specific installation of ncAA: How can the stoichiometry and topology of transcriptional complexes be accurately assessed in their natural environment? What mechanisms control the local concentrations of transcription factors at their genomic target sites? How do individual PTMs in histones or transcription factors control the recognition of their cognate DNA elements or other proteins?

Here, we review recent studies that utilized ncAA and provided first answers in this direction. They demonstrate the considerable potential of genetic code expansion to study fundamental properties of proteins, such as their interactions with other proteins or nucleic acids, transport mechanisms or conformational dynamics (Figure 1). Although a wide range of ncAA have been used in these studies, we focus on photocrosslinkers, photoactivatable ncAA and ncAA with defined PTMs (Scheme 1), since these have had the largest impact in the transcriptional field.

PHOTOCROSSLINKING

Key to the understanding of transcriptional complexes is defining their stoichiometry, topology and conformational dynamics by identifying protein interaction partners and mapping their contact surfaces. This can be a difficult task, since the involved interactions may be weak and transient and ideally have to be identified in vivo. ncAA that form covalent bonds to nearby molecules upon irradiation with light have become widely used tools for this purpose. Photocrosslinking chemistries that have already been genetically encoded are based on aryl azides (Chin et al., 2002b), diazirines (Tippmann et al., 2007; Ai et al., 2011; Chou et al., 2011; Lin et al., 2011; Zhang et al., 2011a; Yanagisawa et al., 2012; Chatterjee et al., 2013), benzophenones (Chin et al., 2002a, 2003; Lacey et al., 2013) and furan (Schmidt and Summerer, 2013). Aryl azides, diazirines and benzophenones form highly reactive intermediates upon irradiation with UV light that subsequently react rather unselectively with nearby molecules. Among these, diazirines and benzophenones are the most widely used. Diazirines are the functionality for which the largest variety of structurally distinct ncAA is available and which can be encoded in the largest variety of hosts, including several pathogenic bacteria (Lin et al., 2011; Zhang et al., 2011a). However, only the benzophenone-containing amino acid 1 (BpA, Scheme 1) has been used to study transcriptional proteins so far. In pioneering studies by the Schultz group, ncAA 1 was genetically encoded in E. coli using a Methanococcus jannaschii tRNATyr/TyrRS pair with an evolved TyrRS mutant (Chin et al., 2002a), and in eukaryotes using an aaRS mutant of an E. coli tRNATyr/TyrRS pair (Chin et al., 2003). Later, the use of a Methanosarcina mazei tRNAPyl/PylRS pair was reported (Lacey et al., 2013). Excited benzophenone preferentially reacts with otherwise unactivated C-H bonds via C-H insertion (Galardy et al., 1973). An advantage is that benzophenone does not photodissociate and that its excited triplet state readily relaxes in the absence of available reaction partners. This reversibility allows repeated excitation and high crosslinking yields (Kauer et al., 1986). A disadvantage, however, is the relatively large size and conformational rigidity of 1, which could perturb the protein interaction surface under study.
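The benefit of reversible excitation can be illustrated with a deliberately simplified probability model (our assumption, not a kinetic model from the cited work). If each excitation of a probe next to a partner leads to a crosslink with probability p, a probe that relaxes back to its ground state after an unproductive excitation can try again, whereas an idealized single-shot probe that is consumed by its first excitation cannot. The function names and the value of p are hypothetical; the model ignores photodamage, partner exchange and competing quenching by solvent.

```python
def crosslink_yield_reversible(p: float, n_excitations: int) -> float:
    """Cumulative crosslink yield for a probe that returns to its ground state
    after every unproductive excitation and can therefore be re-excited."""
    if not 0.0 <= p <= 1.0 or n_excitations < 0:
        raise ValueError("p must be a probability and n_excitations >= 0")
    # Probability of failing all n independent attempts is (1 - p)^n.
    return 1.0 - (1.0 - p) ** n_excitations


def crosslink_yield_single_shot(p: float) -> float:
    """Yield for an idealized probe that is consumed by its first excitation,
    whether or not it finds a partner."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return p


for n in (1, 10, 50):
    print(n, round(crosslink_yield_reversible(0.05, n), 3))
print("single shot", crosslink_yield_single_shot(0.05))
```

With the hypothetical p = 0.05, both probes give 5% crosslinks after one excitation, but the reversible probe accumulates about 40% after 10 and about 92% after 50 excitation cycles, which is the qualitative behavior the text attributes to benzophenone.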

In the only photocrosslinking study so far that utilized ncAA for mapping protein-DNA interactions, 1 was used to characterize the DNA binding of the well-known E. coli catabolite activator protein (CAP), a transcriptional activator that regulates catabolite-sensitive operons by binding to DNA in the presence of its allosteric effector cAMP (Lee et al., 2009).

However, 1 has been used more widely to study protein-protein interactions involved in transcription. For example, 1 has provided a deeper understanding of the interplay between transcriptional activators and co-activators. This is of particular interest, since these interactions are often transient and of only moderate affinity, which makes them difficult targets for interaction studies (Melcher, 2000; Mapp and Ansari, 2007; Fuxreiter et al., 2008). Following proof-of-principle experiments to detect and characterize the interaction between Gal4 and its (high-affinity) repressor Gal80 in S. cerevisiae (Majmudar et al., 2009a), a series of studies was dedicated to uncovering individual binding sites of several transcriptional activators in the co-activator Med15 and the chromatin-modifying co-activator complex Swi/Snf (Majmudar et al., 2009b; Krishnamurthy et al., 2011). Swi/Snf is a multimeric complex that uses a DNA-stimulated ATPase activity for nucleosome remodeling and plays important roles in tumor suppression (Wilson and Roberts, 2011). It has been proposed to be a direct binding partner of transcriptional activators such as the prototypical activator VP16, but the subunits involved and the exact interaction mode in vivo had remained unclear (Figure 2A). Crosslinking studies in which 1 was incorporated at different sites in either the N- or C-terminal region of the VP16 transcriptional activation domain (TAD), targeting the three subunits Swi1, Snf2 and Snf5, revealed the ATPase Snf2 as a direct binding partner of the VP16 C-terminal region. This interaction also occurred with the activator Gal4, suggesting a more general mechanism of activation (Krishnamurthy et al., 2011).

FIGURE 1 | Overview of the opportunities that have been opened up by the use of genetic code expansion in studying various aspects of transcription and its regulation. DNA is shown in gray, RNA in cyan.

SCHEME 1 | Structures of non-canonical amino acids (ncAA) used in the reviewed studies. 1: For incorporation of ncAA 6, different Nε-protected precursors were genetically encoded, followed by posttranslational deprotection in vitro (see also Scheme 2A). 2: For incorporation of ncAA 7, Nε-Boc-L-lysine was genetically encoded, and dimethylation was achieved posttranslationally after deprotection in vitro (see also Scheme 2B). 3: ncAA 10-13 were incorporated into proteins via incorporation of selenocysteine derivatives, oxidative elimination to dehydroalanine 20 and subsequent Michael additions with thiols (see also Scheme 2C).

In a further study, the interactions of the S. cerevisiae TATA-box binding protein (TBP) were mapped in vivo and in isolated transcription pre-initiation complexes (Mohibullah and Hahn, 2008). These experiments revealed direct interactions with the general transcription factor TFIIA and with the Spt3 and Spt8 subunits of the multifunctional co-activator SAGA, which is extensively involved in histone modification. The photocrosslinking data on the interaction with Spt3 provided a starting point for a detailed characterization by mutational studies targeting amino acids close to those incorporation sites of 1 that had yielded efficient crosslinking. Mutations at several sites resulted in a loss of affinity of TBP for SAGA and in reduced transcription activation, which demonstrates the value of 1 for identifying functionally relevant interactions.

Several studies have focused on the identification of regulatory interaction partners of RNA polymerase and the definition of conformational changes in transcriptional complexes. For example, a combination of structural studies and photocrosslinking experiments with bacterial RNA polymerase in complex with Gfh1, an inhibitor of transcription initiation and elongation, revealed that a coiled-coil domain of Gfh1 blocks the NTP entry channel and locks the RNA polymerase in an unusual ratcheted conformation (Tagami et al., 2010).

FIGURE 2 | Photocrosslinking studies with an expanded genetic code. (A) Interaction of the prototypical transcriptional activator VP16 with the nucleosome remodeling complex Swi/Snf. (B) Model showing part of the central cleft of RNA polymerase II in the transcription pre-initiation complex bound to the general transcription factor TFIIE and the Ssl2 subunit of TFIIH. Three winged helix (WH) domains of TFIIE are shown in blue, magenta and dark brown, the Ssl2 subunit in light brown. RNA polymerase II is shown in gray, amino acids analyzed in photocrosslinking studies in orange. Adapted by permission from Macmillan Publishers Ltd: Nature Structural and Molecular Biology (Grünberg et al., 2012), copyright 2012. (C) Red-light-controlled protein-RNA crosslinking using ncAA 2 bearing a furan moiety. Top: activation of the furan by oxidation with singlet oxygen, resulting in a γ-keto-enal. Middle: proposed mechanism for the formation of a cyclic adduct between the γ-keto-enal and cytosine. Bottom left: HIV-1 TAR. Bottom right: arginine-rich motif of HIV-1 TAT and incorporation positions of the furan-bearing ncAA 2.

Two further studies extensively used 1 to study the topology of the RNA polymerase II transcription pre-initiation complex (PIC). Inserting 1 at multiple sites within the central cleft of RNA polymerase formed by the two large subunits Rpb1 and Rpb2 revealed differential interactions of the two subunits with the general transcription factors TFIIB, TFIIF and TFIIE (Chen et al., 2007). Moreover, a detailed study based on local footprinting with the cysteine-reactive, hydroxyl radical-generating probe p-bromoacetamidobenzyl-EDTA-Fe(III) (Fe-BABE) and photocrosslinking with 1 provided insights into the structural arrangement and conformational dynamics of individual subunits of the general transcription factors TFIIE, TFIIH and TFIIB at the central cleft of RNA polymerase II within the PIC (Grünberg et al., 2012). Specifically, 1 was used to map the interaction surface between the TFIIE subunit Tfa2 and Ssl2, a subunit of TFIIH that contains RecA-like domains and is required for DNA opening and the transition from the PIC to the open complex. 1 was introduced at several surface-exposed sites of the Ssl2 RecA-like domains and showed a strong tendency to crosslink to Tfa2 (Figure 2B). In addition, a number of positions bearing 1 unexpectedly crosslinked to TFIIB, which contradicted the current PIC model: in this model, these positions were more than 30 Å away from TFIIB, suggesting the existence of a second conformation of TFIIB. Taken together, these studies demonstrate the broad applicability of 1 to study protein interactions both in vitro and in vivo.
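The TFIIB argument above rests on a simple geometric check: a site that crosslinks to a partner should lie within reach of that partner in the structural model. The Python sketch below makes this check explicit for a set of incorporation sites. It is a hypothetical illustration, not the analysis of Grünberg et al. (2012); the coordinates, the function names and the distance `cutoff` (a rough stand-in for the reach of the crosslinker) are all assumptions.

```python
import math

Coord = tuple[float, float, float]  # x, y, z in Å


def min_distance(site: Coord, partner_atoms: list[Coord]) -> float:
    """Smallest Euclidean distance (Å) from a site to any atom of the partner."""
    return min(math.dist(site, atom) for atom in partner_atoms)


def model_inconsistent_sites(crosslinking_sites: dict[str, Coord],
                             partner_atoms: list[Coord],
                             cutoff: float) -> dict[str, float]:
    """Return crosslinking sites that lie farther than `cutoff` from the partner
    in the structural model, mapped to their minimum distance."""
    flagged = {}
    for name, xyz in crosslinking_sites.items():
        d = min_distance(xyz, partner_atoms)
        if d > cutoff:
            flagged[name] = d
    return flagged


# Hypothetical coordinates purely for illustration.
sites = {"site_a": (0.0, 0.0, 0.0), "site_b": (30.0, 40.0, 0.0)}
partner = [(0.0, 0.0, 5.0)]
print(model_inconsistent_sites(sites, partner, cutoff=20.0))
```

A site flagged this way that nevertheless crosslinks efficiently points to a gap in the model, such as an alternative conformation of the partner, which is the kind of inference drawn for TFIIB.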

Recently, a novel photocrosslinking chemistry based on furan has been genetically encoded that offers properties complementary to those of the previously used chemistries (Figure 2C) (Schmidt and Summerer, 2013). In contrast to direct UV-light activation, this chemistry is activated indirectly by the in situ generation of singlet oxygen (¹O₂) and subsequent oxidation of the genetically encoded, furan-containing ncAA 2 (Scheme 1). Since ¹O₂ can be generated by irradiating photosensitizers with red light, this approach should offer a high penetration depth in complex biological samples and avoid the nucleic acid-damaging photoreactions associated with UV light. This chemistry had previously been described and thoroughly characterized in the context of DNA-DNA interstrand crosslink formation. It proceeds via a proposed mechanism involving a [4+2] cycloaddition of ¹O₂ to the furan, opening of the resulting ozonide, and ultimately formation of a γ-keto-enal that can form cyclic adducts with A, G and C nucleobases (Figure 2C) (Op De Beeck and Madder, 2011, 2012). Since this is a rather nucleic acid-selective crosslinking chemistry that targets only certain nucleobases, it could provide more detailed information on the topology of protein-nucleic acid complexes, potentially including the binding mode of protein motifs (backbone or groove interaction) and the pairing state of individual nucleobases in complexed RNA or DNA. This chemistry was used to map the interaction of the HIV-1 transactivator of transcription (TAT) with its trans-activating response element (TAR), an interaction that is central to HIV-1 mRNA transcription (Figure 2C) (Schmidt and Summerer, 2013). ncAA 2 was introduced at positions within the TAR-binding, arginine-rich motif of TAT, and differential crosslinking efficiencies revealed distinct orientations of these positions relative to TAR.

PHOTOACTIVATION

A second widely employed application of ncAA has been the use of photocaged canonical amino acids to control the function of a protein with high spatiotemporal resolution. This approach is applicable when a protein function depends on a single amino acid whose side chain can be masked by photocaging, or when the photocaging group itself can be positioned so that it perturbs the function of nearby amino acids or motifs, e.g., those involved in catalytic activities or ligand recognition (Riggsbee and Deiters, 2010). Photocaged versions of cysteine, tyrosine, serine and lysine have previously been genetically encoded, initially using o-nitrobenzyl groups that can be decaged with long-wave UV light (365 nm) (Liu and Schultz, 2010). Derivatives with improved photophysical and chemical properties, such as 4,5-dimethoxy-2-nitrobenzyl or 4,5-dioxymethylene-2-nitrobenzyl groups, were employed later (Lemke et al., 2007; Gautier et al., 2010). These provide a bathochromic shift in absorption and enable the use of non-UV light for decaging. Additionally, alkylation of the benzylic methylene group has been used to avoid the formation of reactive aldehydes as products of the photolysis reaction, further increasing the bio-orthogonality of the approach.
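The bathochromic shift matters because photon energy scales inversely with wavelength, E = hc/λ. The short Python calculation below, using the exact SI values of h, c and e, compares the decaging wavelengths used in the studies discussed in this section (360, 365 and 405 nm). The function name is our own, and the sketch only illustrates photon energies; it does not model the absorption spectra or quantum yields of the caging groups.

```python
PLANCK_J_S = 6.62607015e-34            # Planck constant (exact SI value)
SPEED_OF_LIGHT_M_S = 299_792_458.0     # speed of light in vacuum (exact)
ELEMENTARY_CHARGE_C = 1.602176634e-19  # converts joules to electronvolts (exact)


def photon_energy_ev(wavelength_nm: float) -> float:
    """Energy of a single photon, E = h*c/lambda, in electronvolts."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * SPEED_OF_LIGHT_M_S / wavelength_m / ELEMENTARY_CHARGE_C


for nm in (360, 365, 405):
    print(f"{nm} nm: {photon_energy_ev(nm):.2f} eV")
```

A 405 nm photon carries about 3.06 eV compared with about 3.40 eV at 365 nm, roughly 10% less energy per photon.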

Transcription factors 

In the first biological study that utilized ncAA in live eukaryotic cells, photocaged serine 3 (Scheme 1) was used to study a regulatory transport mechanism of the S. cerevisiae transcription factor Pho4 (Lemke et al., 2007). This ncAA was encoded in S. cerevisiae using an evolved LeuRS mutant of an E. coli tRNALeu/LeuRS pair. Pho4 activates the expression of a number of genes in response to phosphate starvation. In contrast, under phosphate-rich conditions, Pho4 is phosphorylated at several serine sites by the cyclin/cyclin-dependent kinase complex Pho80/Pho85 and subsequently exported to the cytoplasm, which downregulates its transcriptional activity (Komeili and O'Shea, 1999). In a Pho4-GFP fusion construct that exhibited regular transport behavior and activity, several critical serines were replaced by 3, resulting in non-phosphorylated Pho4 even when cells were grown under phosphate-rich conditions. This had previously also been observed in studies employing alanine mutants of the individual serines, which allowed their individual roles in nuclear export to be dissected (Komeili and O'Shea, 1999). However, alanine mutants could only provide a static view of this highly dynamic process. In contrast, decaging of 3 by a millisecond laser pulse at 405 nm allowed phosphorylation and nuclear export to be triggered with high spatiotemporal resolution. This allowed the direct, quantitative tracking of nuclear export and revealed distinct roles of the individual serines through differential export kinetics.
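Once decaging starts the process, the loss of nuclear GFP signal can be converted into a rate constant for export. The Python sketch below assumes, for illustration only, that the background-corrected nuclear fluorescence of a Pho4-GFP construct decays as a single exponential after the laser pulse; this is our simplifying assumption, not the analysis reported by Lemke et al. (2007). Under that assumption, differences in the fitted rate constant k, or in the corresponding half-time, between constructs would be one way to express differential export kinetics. The function names are hypothetical.

```python
import math


def fit_first_order_export(times: list[float], nuclear_signal: list[float]) -> tuple[float, float]:
    """Fit S(t) = S0 * exp(-k * t) by linear least squares on ln S(t).

    Returns (k, S0). Signals must be background-corrected and positive.
    """
    if len(times) != len(nuclear_signal) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    if any(s <= 0 for s in nuclear_signal):
        raise ValueError("signals must be positive to take logarithms")
    logs = [math.log(s) for s in nuclear_signal]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx                    # slope of ln S versus t equals -k
    intercept = y_mean - slope * t_mean  # intercept equals ln S0
    return -slope, math.exp(intercept)


def half_time(k: float) -> float:
    """Time for the nuclear signal to fall to half its initial value."""
    return math.log(2) / k
```

Fitting on the logarithm keeps the sketch dependency-free; with noisy data, a nonlinear fit that includes a baseline term would usually be preferable.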

A second study extended the use of photocaged ncAA to mammalian cells, using a complementary approach to control the localization of a transcription factor. Photocaged lysine 5 (Scheme 1) was genetically encoded using an evolved M. barkeri PylRS mutant and used to mask a single lysine position in the nuclear localization sequence (NLS) of a fusion construct of the tumor suppressor p53 and GFP (Gautier et al., 2010; Lemke, 2010). In contrast to the wild-type construct, which showed nuclear localization, the caged construct exhibited the same phenotype as a corresponding alanine mutant and was localized in the cytoplasm. Irradiation with a 5 s light pulse at 365 nm triggered nuclear import, as decaging restored a functional NLS.

RNA polymerase 

In two pioneering studies by the Deiters group, photocaged ncAA have also been used to directly control the activity of RNA polymerases. In both examples, bacteriophage T7 RNA polymerase, which is orthogonal in a broad range of prokaryotic and eukaryotic organisms, was inactivated by introducing a single photocaged tyrosine 4 or lysine 5 (Scheme 1). ncAA 4 was genetically encoded in E. coli using a TyrRS mutant of the M. jannaschii tRNATyr/TyrRS pair (Deiters et al., 2006) and later in eukaryotes using a PylRS mutant of the M. barkeri tRNAPyl/PylRS pair (Arbely et al., 2012). Inactivation with caged ncAA 4 was achieved by substituting an active-site tyrosine that plays an essential role in the NTP-induced transition from the open to the closed conformation of T7 RNA polymerase during the catalytic cycle of the elongation state (Chou et al., 2010). Decaging restored catalytic activity both in vitro and in E. coli and HEK 293T cells, as indicated by reporter gene expression. However, because the employed M. jannaschii TyrRS/tRNATyr pair is not orthogonal in eukaryotes, the caged T7 RNA polymerase had to be transfected into the HEK 293T cells as a protein.

Very recently, photocaged lysine 5 was employed to control T7 RNA polymerase activity in an advanced setup (Figure 3A) (Hemphill et al., 2013). Since this ncAA is a substrate of the M. barkeri PylRS/tRNAPyl pair, which is orthogonal in all domains of life, the approach enabled the direct, intracellular expression of the photocaged protein. An active-site lysine that is critical for catalytic activity, because it recognizes incoming NTPs via their α-phosphate, was replaced by 5 (Figure 3B). This rendered the RNA polymerase completely inactive in mammalian reporter gene assays until activation by irradiation at 365 nm. The potential of this approach was demonstrated by controlling the expression of an shRNA used to knock down the Eg5 gene via RNA interference. Eg5 encodes a motor protein that is essential in mitosis; its loss results in monopolar spindles and mitotic arrest. This phenotype was successfully induced by UV light using the described caging strategy (Figure 3C).

POSTTRANSLATIONAL MODIFICATIONS

Posttranslational modifications greatly expand the chemical potential of the proteome and control various critical functions of proteins involved in transcription (Walsh et al., 2005). However, gaining insights into the precise roles of individual PTMs has been hampered by the difficulty of isolating or preparing selectively and homogeneously modified proteins. These difficulties include the reversible and thus often incomplete modification of intracellular proteins, the presence of multiple modification sites and types, and the challenge of separating modified from non-modified proteins, because PTMs often cause only subtle changes in the biophysical properties of the target protein (Latham and Dent, 2007). Consequently, there is a need for methods to synthesize homogeneous proteins with defined PTMs in sufficient amounts for biochemical studies. This need has started to be addressed by the direct genetic encoding of a variety of modified lysines with critical roles in both histones and transcription factors, in studies by the groups of Chin, Liu and Schultz (Liu et al., 2011).

FIGURE 3 | Photoactivation of T7 RNA polymerase using photocaged lysine 5. (A) General principle of the photoactivation of transcription of an anti-Eg5 shRNA. (B) Position of the lysine replaced with 5, resulting in an inactive T7 RNA polymerase. (C) Photoactivation of transcription of the anti-Eg5 shRNA and subsequent RNAi-mediated knockdown of Eg5, resulting in a binuclear phenotype. Adapted with permission from Hemphill et al. (2013). Copyright (2013) American Chemical Society.

Lysine methylation in histones 

The reversible Nε-methylation of specific lysine residues in histones (Allfrey et al., 1964) is an important regulatory PTM that controls heterochromatin remodeling (Cheung and Lau, 2005). Lysine can be mono-, di- or trimethylated at Nε, and these methylation degrees have distinct functional roles (Taverna et al., 2007); they are established and removed by histone methyltransferases and demethylases.
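Because each methyl group replaces one hydrogen on the Nε-amine, every methylation degree adds a net CH2 unit to the lysine side chain. The Python sketch below computes the resulting monoisotopic mass shifts from standard atomic masses, which is how methylation degrees are distinguished when modified histone peptides are analyzed by mass spectrometry. The dictionary and function names are our own illustration.

```python
# Monoisotopic atomic masses (Da) of the most abundant isotopes.
MONOISOTOPIC_DA = {"C": 12.0, "H": 1.00782503207}


def mass_shift(composition: dict[str, int]) -> float:
    """Net monoisotopic mass change for a given elemental composition."""
    return sum(MONOISOTOPIC_DA[element] * count for element, count in composition.items())


# Each CH3 replaces one N-epsilon hydrogen: net +CH2 per methylation step.
METHYLATION_DEGREES = {
    "Kme1": {"C": 1, "H": 2},
    "Kme2": {"C": 2, "H": 4},
    "Kme3": {"C": 3, "H": 6},
}

for name, comp in METHYLATION_DEGREES.items():
    print(f"{name}: +{mass_shift(comp):.4f} Da")
```

The three degrees differ by a constant 14.0157 Da, so mono-, di- and trimethyllysine shift a peptide by about 14.0157, 28.0313 and 42.0470 Da, respectively.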

A biochemical method to produce site-specifically Nε-methylated histones for studying methylation in vitro and in living cells remained elusive for a long time. Methyltransferases have been used, but this approach is hampered by limited control over regioselectivity and the extent of methylation, and for many sites the respective methyltransferases have yet to be discovered (Martino et al., 2009). In contrast, advanced in vitro techniques such as native chemical ligation (He et al., 2003; Chatterjee and Muir, 2010) or sortase-mediated ligation (Piotukh et al., 2011) have succeeded in producing homogeneously methylated histones.

Only recently was the site-specific incorporation of the monomethylated lysine Nε-methyl-L-lysine 6 (Scheme 1) achieved by means of an expanded genetic code. However, presumably because of the structural similarity of lysine and 6, no orthogonal aaRS has been evolved for this ncAA to date. Following an alternative strategy, Chin and coworkers (Nguyen et al., 2009) genetically encoded Nε-Boc-Nε-methyl-L-lysine 16 as a precursor using wild-type M. barkeri pyrrolysyl-tRNA synthetase. This ncAA was incorporated into position K9 of histone H3, subsequently deprotected with 2% TFA and refolded (Scheme 2A). A related strategy that does not require denaturing deprotection conditions was later introduced, based on Nε-allyloxycarbonyl-Nε-methyl-L-lysine 17 (Scheme 2A) (Ai et al., 2010). This ncAA was incorporated into position K27 of histone H2B using a Y384F mutant of M. barkeri PylRS and deprotected with the ruthenium catalyst [Cp*Ru(cod)Cl] under mild conditions in an aqueous environment. As a further advance offering increased bio-orthogonality and spatiotemporal control of deprotection, Schultz and Liu reported the genetic encoding of the photocaged ncAA Nε-(o-nitrobenzylcarbamoyl)-Nε-methyl-L-lysine 18 using evolved PylRS mutants (Scheme 2A) (Groff et al., 2010; Wang et al., 2010b). This ncAA can be deprotected under mild conditions by irradiation with UV light at 360 nm, as demonstrated in mammalian cells (Groff et al., 2010).

In contrast to the various approaches for obtaining monomethylated lysine in histones, only one strategy for the genetic encoding of Nε,Nε-dimethyl-L-lysine 7 (Scheme 1) has been reported. No aaRS mutants that recognize 7 or protected derivatives of it as substrates have been developed. Instead, posttranslational dimethylation in combination with two orthogonal protecting groups was used in this case.
SCHEME 2 | Synthetic strategies for the introduction of different lysine PTMs via genetic code expansion. (A) Introduction of Nε-methyl-L-lysine. (B) Introduction of Nε,Nε-dimethyl-L-lysine. (C) Flexible introduction of various PTMs in the form of L-lysine mimics.



First, Nε-Boc-L-lysine was incorporated at the target site K9 of histone H3 using wild-type M. barkeri PylRS (Scheme 2B). In the resulting protein, all canonical lysine residues were protected with N-(benzyloxycarbonyloxy)succinimide (Cbz-OSu) under denaturing conditions. Subsequently, the Boc group at K9 was selectively removed with TFA:water (9:6) at 4°C, and the liberated Nε-amine was dimethylated by reductive methylation using formaldehyde and dimethylamine borane (DMAB). Finally, the Cbz groups were removed selectively, yielding histone H3 with 7 at position K9 in its natural conformation, as indicated by immunoblots with anti-H3K9me2 and anti-H3K9me1 antibodies and by co-immunoprecipitation experiments with heterochromatin protein 1 (HP1).
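The logic of this orthogonal protection scheme can be traced with a small state model. The Python sketch below is an idealization under stated assumptions: each reaction is treated as complete and perfectly selective, only lysine Nε-amines are tracked (the N-terminal amine and other nucleophiles are ignored), and the lysine labels other than the K9 target are hypothetical placeholders rather than the actual lysine content of histone H3. The names `Amine`, `_convert` and `install_kme2` are our own.

```python
from enum import Enum


class Amine(Enum):
    """Idealized state of a lysine N-epsilon amine during the synthesis."""
    FREE = "NH2"
    BOC = "NH-Boc"
    CBZ = "NH-Cbz"
    DIMETHYL = "NMe2"


def _convert(sites: dict[str, Amine], src: Amine, dst: Amine) -> dict[str, Amine]:
    """Apply an idealized, complete and perfectly selective reaction src -> dst."""
    return {pos: (dst if state is src else state) for pos, state in sites.items()}


def install_kme2(lysines: list[str], target: str) -> dict[str, Amine]:
    # Genetic encoding: only the target position carries N-epsilon-Boc-lysine.
    sites = {pos: (Amine.BOC if pos == target else Amine.FREE) for pos in lysines}
    sites = _convert(sites, Amine.FREE, Amine.CBZ)       # Cbz-OSu caps canonical lysines
    sites = _convert(sites, Amine.BOC, Amine.FREE)       # acid removes Boc, Cbz survives
    sites = _convert(sites, Amine.FREE, Amine.DIMETHYL)  # reductive dimethylation
    sites = _convert(sites, Amine.CBZ, Amine.FREE)       # Cbz removal
    return sites


result = install_kme2(["K4", "K9", "K14"], target="K9")  # hypothetical lysine set
print({pos: state.value for pos, state in result.items()})
```

Only the Boc-protected target is ever a free amine while formaldehyde and the reductant are present, which is why the two protecting groups must be removable under mutually exclusive conditions.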

Although strategies have been developed for the introduction of both 6 and 7, the genetic encoding of Nε,Nε,Nε-trimethyl-L-lysine has not yet been reported. However, a strategy to introduce amino acid modifications via selenocysteine derivatives proposed by Schultz (Wang et al., 2007a; Guo et al., 2008) allowed the incorporation of the trimethylated lysine mimic 12 (Scheme 1) into histone H3 (Wang et al., 2012). This approach, initially described for phenyl-L-selenocysteine encoded with an M. jannaschii aaRS mutant and later for the derivative 19 (Scheme 2C) encoded with an M. mazei PylRS mutant, is based on an oxidative elimination that yields dehydroalanine 20. This residue can be derivatized by Michael addition reactions with thiols, resulting in a whole range of lysine PTM mimics (10-13; Schemes 1 and 2C). Drawbacks of this methodology are the oxidative reaction conditions, which can affect cysteines and methionines, and the changes in the flexibility of the lysine side chain and in the pKa of the Nε-ammonium group caused by the sulfur atom. However, the first drawback is irrelevant for proteins whose cysteines and methionines can be mutated without loss of function, and the approach offers considerable flexibility.

Lysine acetylation in histones 

Nε-acetylation of lysine in histones is a reversible PTM that is catalyzed by histone acetyltransferases and deacetylases and is critically involved in chromatin remodeling and thus transcriptional control (Jenuwein and Allis, 2001; Kouzarides, 2007). Besides the introduction of Nε-acetyl-L-lysine 8 (Scheme 1) by native chemical ligation (Shogren-Knaak et al., 2006), its direct insertion by genetic encoding has been reported by Chin and coworkers using an evolved M. barkeri PylRS mutant (Neumann et al., 2008), which was further employed for the multiple incorporation of 8 into proteins (Huang et al., 2010). This ncAA was introduced, among other positions, into position K56 of histone H3 and used for FRET (fluorescence resonance energy transfer) studies on reconstituted nucleosomes. These revealed that acetylation of K56 does not have a direct, measurable effect on nucleosome stability and only moderately affects the activity of the nucleosome remodeling complexes Swi/Snf and RSC. However, DNA breathing (i.e., thermal motions that induce spontaneous opening and re-closing of the double helix), which can lead to unwrapping of DNA from the nucleosome, was increased by acetylation of K56 (Neumann et al., 2009).

In another study, the impact of K16 acetylation in histone H4 on the binding of Sir (silent information regulator) proteins and on linker DNA accessibility was investigated. This revealed that K16 acetylation decreases the affinity of Sir3 for chromatin and affects chromatin structure. In contrast, the Sir2-4 subcomplex exhibited increased affinity when K16 was acetylated, suggesting a dual role of K16 acetylation, i.e., the recruitment of Sir2-4 and the repulsion of Sir3 (Oppikofer et al., 2011).

Lysine crotonylation in histones 

Recent studies have revealed several acyl groups, including crotonyl, malonyl, and succinyl groups, as novel lysine PTMs of histones (Du et al., 2011; Tan et al., 2011; Zhang et al., 2011b). Nε-Crotonyl-L-lysine 9 (Scheme 1) is mainly found in transcriptionally active chromatin and differs from Nε-acetyl-L-lysine in its regulation and genomic distribution (Tan et al., 2011). The genetic encoding of 9 was described by Schultz and coworkers (Kim et al., 2012) and later by other groups using PylRS mutants (Gattner et al., 2013; Lee et al., 2013). This ncAA was incorporated at position K11 of human histone H2B and was recognized by an anti-Nε-crotonyl-L-lysine antibody (Kim et al., 2012).

Lysine acetylation in transcription factors 

Besides its critical role as a histone PTM, lysine acetylation is widely found in many different classes of proteins, including transcription factors (Kim et al., 2006). Along these lines, Nε-acetyl-L-lysine 8 was used to study the role of K120 (which is acetylated in response to DNA damage) in the DNA-binding domain (DBD) of the tumor suppressor p53 (Arbely et al., 2011). In its unmodified form, p53 is not capable of selectively binding to specific response elements under physiological salt concentrations in vitro, but rather exhibits random binding. Upon acetylation of K120, however, binding becomes selective under these conditions. Moreover, both unmodified p53 and p53 acetylated at K120 preferentially form homocomplexes with DNA in a 4:1 stoichiometry, rather than mixed complexes containing both p53 modification states. This suggests that the two DBD forms prefer distinct quaternary structures, which corroborates previous findings that p53 binds to nonspecific and specific DNA sequences in different quaternary structures.

In a second study, the influence of Nε-lysine acetylation on the activity of transcription factors in E. coli was investigated (Thao et al., 2010). A proteome screen for substrates of the Gcn5-like protein acetyltransferase (Pat) identified four transcription factors as substrates. One of them, RcsB, was found to be acetylated at position K180. This modification proved to be reversible, since K180 could also be deacetylated by CobB, a sirtuin-like protein deacetylase. Electrophoretic mobility shift assays with RcsB bearing 8 revealed that K180 acetylation was critical for DNA binding, suggesting for the first time that bacteria use this PTM to regulate gene expression.

OTHER ncAA FUNCTIONS 

Besides the three main applications of ncAA described above, a number of other ncAA functions have been used in the context of transcription factor–DNA complexes in earlier studies. Although these studies have so far been isolated proof-of-concept experiments, the employed ncAA have significant potential for wider use in the field.

For example, the genetic encoding of the azobenzene-containing amino acid 14 (Scheme 1) offers properties for the photocontrol of protein function that complement those of photocaged amino acids (Bose et al., 2006). ncAA 14 can undergo a reversible cis–trans photoisomerization: irradiation at 320–340 nm converts the more stable trans isomer into the cis isomer, which can re-isomerize thermally or upon irradiation at >420 nm. The two isomers differ in geometry and dipole moment, which can be exploited to photomodulate the structure and consequently the binding affinity and/or activity of a protein. This was demonstrated by photomodulating the affinity of E. coli CAP for cAMP and, consequently, its binding to its cognate DNA binding site in the lac promoter.

Another promising advance has been the genetic encoding of the bipyridyl-containing ncAA 15, which can serve as a metal ion chelator (Xie et al., 2007). When chelating Fe(II) or Cu(II) in the presence of a reducing agent, this ncAA can trigger oxidative cleavage of the DNA backbone, making it a useful probe for mapping protein–DNA interactions, though with limited potential for intracellular applications (Lee and Schultz, 2008).

Finally, dimerization events upon DNA binding of a Gcn4 bZIP protein were monitored via quenching of intrinsic tryptophan fluorescence in the protein by the ncAA p-nitro-L-phenylalanine (Tsao et al., 2006).

CONCLUSIONS AND OUTLOOK 

The studies reviewed here demonstrate the considerable potential of genetic code expansion to provide detailed insights into the molecular mechanisms that underlie transcription and its regulation. These insights are as diverse as the chemical functions of the employed ncAA, and in many cases could not have been provided by traditional approaches of molecular biology. With respect to the main ncAA functions discussed here, several avenues for future improvement can be envisaged. Photocrosslinking ncAA with additional, complementary chemoselectivities could be developed to provide a further layer of information on the protein complexes under study, beyond the mere proximity of interaction partners that non-specific photocrosslinkers usually report. Moreover, the structure of photocrosslinking ncAA can be tailored to the application: for high success rates in discovering unknown interaction partners, photocrosslinkers with long, flexible linkers and wide-range reactivity are desirable (Chou et al., 2011; Zhang et al., 2011a; Yanagisawa et al., 2012; Schmidt and Summerer, 2013). In contrast, mapping the interaction surfaces of known complexes at higher resolution would benefit from small, rather rigid crosslinkers. In both cases, perturbation of binding has to be minimal. Although the applicability of both photocrosslinking and photoactivatable ncAA activated by UV light has been thoroughly proven, a wider range of such ncAA that can be activated by red light would open new perspectives for studying complex samples, such as multicellular organisms. Additionally, explicitly nucleic acid-directed ncAA functions (e.g., nucleolytic backbone cleavers) that are applicable in vivo could provide valuable tools for studying the protein–nucleic acid interactions involved in transcription. Finally, since currently encodable PTMs represent only a subset of those found in nature, the genetic encoding of further PTMs would be highly attractive.

From such methodological advances, combined with the growing number of organisms whose genetic code can be expanded, a multitude of insights into the biological functions of proteins in their native, intracellular environment can be expected.